Detecting Documents with Complaint Character

Authors

  • Sebastian Ebert
  • Benjamin Adrian
Abstract

Recognizing complaint documents as early and as quickly as possible is a worthwhile goal for companies. In this paper, we present an analysis showing the complexity of this practically relevant problem. To this end, we define the task and its challenges and investigate statistical methods for automated Complaint Detection in incoming text documents. Two different approaches for handling complaint documents are presented. First, we analyze various term weightings in a standard bag-of-words approach. Second, we show the effect of feature engineering techniques known from Natural Language Processing. The results on four German corpora and one English corpus show that even a linear classifier achieves valuable results and is competitive with more sophisticated methods in most cases.
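To make the first approach concrete, the following is a minimal sketch of a bag-of-words pipeline with three interchangeable term weightings feeding a linear classifier. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the weightings or the classifier, so binary, term-frequency and TF-IDF weighting and a plain perceptron are stand-ins chosen here. All function names and the tiny English training set are hypothetical, and the toy data is deliberately easy to separate.

```python
import math
import re
from collections import Counter

# Hypothetical toy data, invented for illustration (not from the paper's corpora).
# Label +1 = complaint, -1 = no complaint.
TRAIN_DOCS = [
    "The delivery was late and the product arrived broken",
    "I demand a refund because the product is broken",
    "Terrible service and late delivery again",
    "Please send me an invoice regarding my order",
    "Thank you very much my order came quickly",
    "Please update my shipping address",
]
TRAIN_LABELS = [1, 1, 1, -1, -1, -1]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the training set."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def vectorize(text, idf, weighting="tfidf"):
    """Map a text to a sparse, L2-normalized bag-of-words vector.

    Terms unseen during training are ignored.
    """
    counts = Counter(t for t in tokenize(text) if t in idf)
    if weighting == "binary":
        vec = {t: 1.0 for t in counts}
    elif weighting == "tf":
        vec = {t: float(c) for t, c in counts.items()}
    elif weighting == "tfidf":
        vec = {t: c * idf[t] for t, c in counts.items()}
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def score(vec, weights, bias):
    """Linear decision function: bias plus the sparse dot product."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())


def predict(vec, weights, bias):
    return 1 if score(vec, weights, bias) > 0 else -1


def train_perceptron(vectors, labels, epochs=20):
    """Classic perceptron: update the weights only on misclassified examples."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            if predict(vec, weights, bias) != y:
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + y * v
                bias += y
    return weights, bias


if __name__ == "__main__":
    idf = build_idf(TRAIN_DOCS)
    for weighting in ("binary", "tf", "tfidf"):
        vectors = [vectorize(d, idf, weighting) for d in TRAIN_DOCS]
        weights, bias = train_perceptron(vectors, TRAIN_LABELS)
        query = vectorize("The product arrived broken", idf, weighting)
        print(weighting, predict(query, weights, bias))
```

Swapping the `weighting` argument changes only the feature values, not the classifier, which is what allows term weightings to be compared against a fixed linear model. On real corpora, a held-out evaluation set would be needed to compare them meaningfully.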

Similar resources

Spatial Semantic Scan: Detecting Subtle, Spatially Localized Events in Text Streams

Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. We describe Spatially Compact Semantic Scan (SCSS) that has been developed specifically to overcome the shortcomings of current methods in detecting new spati...

A New Apporach to Optical Character Recognition Based on Text Recognition in Ocr

Optical Character Recognition (OCR) is a technology that enables you to convert different types of documents, such as scanned paper documents, either handwritten or machine-printed script, PDF files or images captured by a digital camera, into editable and searchable data. Our intention is to build an automatic text localization and extraction system which is able to accept different types of...

Detecting Changes in Chief Complaint Word Count: Effects on Syndromic Surveillance

Introduction: The New York City (NYC) Department of Health and Mental Hygiene (DOHMH) receives daily ED data from 49 of NYC’s 52 hospitals, representing approximately 95% of ED visits citywide. Chief complaint (CC) is categorized into syndrome groupings using text recognition of symptom keywords and phrases. Hospitals are not required to notify the DOHMH of any changes to procedures or health i...

Inversion Detection in Text Document Images

OCR makes it possible for the user to edit or search the document’s contents. In this paper we describe a special water fill technique for detecting upside-down text documents. Each character has upside and downside filling capacities. A character may have a two-sided, one-sided, or zero filling capacity. The total upside and downside capacities for the scanned page calculat...

Automatic Text Recognition from Raster Maps

Text labels in raster maps provide valuable geospatial information by associating geospatial locations with geographical names. Although present commercial optical character recognition (OCR) products can achieve a high recognition rate on documents, text recognition on raster maps is still challenging due to the varying text orientations and the overlapping between text labels. This paper pres...

Publication date: 2013